This document was inspired by this post by Arthur Charpentier.
The repository with the code for creating this post is here. Just some plots so far. We may try something more elaborate (e.g., explicit comparisons of Colombia with other countries) later on.
The data file is available here.
One possible measure of inequality is entropy, the classical notion developed by Shannon for information theory. Here I use a generalized version, with \(\alpha=1\).
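The text does not spell out which generalization is meant. As a rough sketch, assuming it is the generalized entropy index used in the inequality literature, the case \(\alpha=1\) is the Theil T index, which could be computed as follows. The function name `theil_index` and the sample scores are made up for illustration and are not taken from the post's repository.

```python
import math

def theil_index(scores):
    """Generalized entropy index at alpha = 1 (Theil T index).

    Assumption: this is the 'generalized version' the post refers to.
    Scores must be strictly positive.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Average of (x / mean) * ln(x / mean) over all scores.
    return sum((x / mean) * math.log(x / mean) for x in scores) / n

# Hypothetical scores: one perfectly equal group, one spread-out group.
equal = [500.0, 500.0, 500.0, 500.0]
spread = [300.0, 450.0, 550.0, 700.0]

print(theil_index(equal))
print(theil_index(spread))
```

Under this definition the index is zero when all scores are identical and grows as scores spread out relative to their mean.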
Although entropy is not necessarily a measure of variance, collections of test scores with higher entropy tend to have a lower average than those with lower entropy:
By the way, why is the entropy of female scores (almost) consistently lower than that of males of the same country?
Standard deviation seems to be the most commonly used indicator of performance inequality in standardized test scores. The ranking changes drastically:
First, a violin plot of distributions of scores in math (differentiating by sex and ordered by entropy):
Empirical cumulative distribution function for each country in math:
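For readers unfamiliar with the term, here is a minimal sketch of what an empirical cumulative distribution function computes: the fraction of observed scores at or below a given value. The helper name `ecdf` and the sample scores are hypothetical, and this is not necessarily how the plots in this post were produced.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function F where F(x) is the share of the sample <= x."""
    sorted_sample = sorted(sample)
    n = len(sorted_sample)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(sorted_sample, x) / n

    return F

# Hypothetical math scores for one country.
F = ecdf([400, 450, 500, 550])
print(F(450))
```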
And a kernel density estimate (also for math):
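As an illustration of the idea, a kernel density estimate places a small bump on each observed score and averages the bumps. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth. Both are assumptions, since the post does not say which kernel or bandwidth its plots use, and the names `gaussian_kde` and `density` are hypothetical.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a density function built from Gaussian kernels (assumed kernel)."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centered on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density

# Hypothetical scores and bandwidth, for illustration only.
density = gaussian_kde([400, 450, 500, 550], bandwidth=30.0)
print(density(475))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, so the choice affects how the distributions look when compared.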
In his post, Charpentier compares France’s score distribution with those of other countries by plotting the difference between the quantiles at each level. Here I do the same for Colombia against 17 other countries, using the math scores. Just as an illustration, I also include Colombia versus Colombia (red is male and blue is female).
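The quantile comparison described above can be sketched roughly as follows: compute the same set of quantiles for two groups and subtract them level by level. The function name `quantile_differences`, the number of levels, the interpolation method, and the sample scores are all assumptions made for illustration. They are not taken from Charpentier's post or from this post's repository.

```python
from statistics import quantiles

def quantile_differences(scores_a, scores_b, n=10):
    """Difference of quantiles (a minus b) at each of the n - 1 cut points."""
    qa = quantiles(scores_a, n=n, method="inclusive")
    qb = quantiles(scores_b, n=n, method="inclusive")
    return [a - b for a, b in zip(qa, qb)]

# Hypothetical scores: group_b is group_a shifted up by 20 points.
group_a = [350.0, 380.0, 410.0, 440.0, 470.0, 500.0, 530.0, 560.0]
group_b = [x + 20.0 for x in group_a]

diffs = quantile_differences(group_a, group_b)
print(diffs)
```

A flat line of differences indicates a uniform shift between the two distributions, while a sloped or curved line indicates that the gap varies between low and high scorers.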
And let’s do the same with Singapore:
Serious approaches to this problem: